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Abstract. In this work, we develop a reduced-basis approach for the efficient computation 
of parametrized expected values, for a large number of parameter values, using the control variate 
method to reduce the variance. Two algorithms are proposed to compute online, through a cheap 
reduced-basis approximation, the control variates for the computation of a large number of expecta- 
tions of a functional of a parametrized Ito stochastic process (solution to a parametrized stochastic 
differential equation). For each algorithm, a reduced basis of control variates is pre-computed of- 
fline, following a so-called greedy procedure, which minimizes the variance among a trial sample of 
the output parametrized expectations. Numerical results in situations relevant to practical appli- 
cations (calibration of volatility in option pricing, and parameter-driven evolution of a vector field 
following a Langevin equation from kinetic theory) illustrate the efficiency of the method. 

1. Introduction 

This article develops a general variance reduction method for the many-query 
context where a large number of Monte-Carlo estimations of the expectation E(Z A ) 
of a functional 

Z x =g x (X x )- ( T f\s,X*)ds (1.1) 
Jo 

of the solutions {X x ,t £ [0,T]J to the stochastic differential equations (SDEs): 

X? = x+ f b x (s,X x )ds + f a x (s,X x )dB s (1.2) 
Jo Jo 

parametrized by A € A have to be computed for many values of the parameter A. 

Such many-query contexts are encountered in finance for instance, where pric- 
ing options often necessitates to compute the price E(Z A ) of an option with spot 
price X x at time t in order to calibrate the local volatility er A as a function of a 
(multi-dimensional) parameter A (that is minimize over A, after many iterations of 
some optimization algorithm, the difference between observed statistical data with 
the model prediction). Another context for application is molecular simulation, for 
instance micro-macro models in rheology, where the mechanical properties of a flow- 
ing viscoelastic fluid are determined from the coupled evolution of a non-Newtonian 
stress tensor field E (Z A ) due to the presence of many polymers with configuration 
X x in the fluid with instantaneous velocity gradient field A. Typically, segregated 
numerical schemes are used: compute X x for a fixed field A, and then compute A for 
a fixed field E(Z A ). Such tasks are known to be computationally demanding and the 
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use of different variance reduction techniques to alleviate the cost of Monte-Carlo 
computations in those fields is very common (see [2j [19l [22] [3] for instance). 

In the following, we focus on one particular variance reduction strategy termed 
the control variate method |10 [ [2T | [20] . More precisely, we propose new approaches 
in the context of the computation of E (Z A ) for a large number of parameter values 
A, with the control variate method. In these approaches, the control variates are 
computed through a reduced-basis method whose principle is related to the reduced- 
basis method |17l IT8l l23l 2] [5] previously developed to efficiently solve parametrized 
partial differential equations (PDEs). Following the reduced-basis paradigm, a small- 
dimensional vector basis is first built offline to span a good linear approximation 
space for a large trial sample of the A-parametrized control variates, and then used 
online to compute control variates at any parameter value. The offline computations 
are typically expensive, but done once for all. Consequently, it is expected that the 
online computations (namely, approximations of E (-^ A ) for many values of A) are very 
cheap, using the small-dimensional vector basis built offline for efficiently computing 
control variates online. Of course, such reduced-basis approaches can only be efficient 
insofar as: 

1. online computations (of one output E(Z A ) for one parameter value A) are 
significantly cheaper using the reduced-basis approach than without, and 

2. the amount of outputs E(Z A ) to be computed online (for many different 
parameter values A) is sufficient to compensate for the (expensive) offline 
computations (needed to build the reduced basis). 

In this work, we will study numerically how the variance is reduced in two examples 
using control variates built with two different approaches. 

The usual reduced-basis approach for parametrized PDEs also traditionally fo- 
cuses on the certification of the reduction (in the parametrized solution manifold) 
by estimating a posteriori the error between approximations obtained before/after 
reduction for some output which is a functional of the PDE solution. Our reduced- 
basis approach for the parametrized control variate method can also be cast into a 
goal-oriented framework similar to the traditional reduced basis method. One can 
take the expectation E(Z A ) as the reduced-basis output, while the empirically esti- 
mated variance VarM {Z x ) serves as a computable (statistical) error indicator for the 
Monte-Carlo approximations Em (Z x ) of E (Z x ) in the limit of large M through the 



Central Limit Theorem (see error bound (2.4 1 in Section 2.1 



In the next Section [2] the variance reduction issue and the control variate method 
are introduced, as well as the principles of our reduced-basis approaches for the com- 
putation of parametrized control variates. The Section [3] exposes details about the 
algorithms which are numerically applied to test problems in the last Section [4] 

The numerical simulations show good performance of the method for the two test 
problems corresponding to the applications mentionned above: a scalar SDE with 
(multi-dimensional) parametrized diffusion (corresponding to the calibration of a local 
volatility in option pricing), and a vector SDE with (multi-dimensional) parametrized 
drift (for the parameter-driven evolution of a vector field following a Langevin equation 
from kinetic theory). Using the control variate method with a 20-dimensional reduced 
basis of (precomputed) control variates, the variance is approximatively divided by 
a factor of 10 4 in the mean for large test samples of parameter in the applications 
we experiment here. As a consequence, our reduced-basis approaches allows to ap- 
proximately divide the online computation time by a factor of 10 2 , while maintaining 
the confidence intervals for the output expectation at the same value than without 
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reduced basis. 

This work intends to present a new numerical method and to demonstrate its 
interest on some relevant test cases. We do not have, for the moment, a theoretical 
understanding of the method. This is the subject of future works. 

2. The variance reduction issue and the control variate method 

2.1. Mathematical preliminaries and the variance reduction issue 

Let (B t £M. d ,t £ [0,T]) be a d-dimensional standard Brownian motion (where d is 
a positive integer) on a complete probability space (CI, JF,P), endowed with a filtration 
(T t: t £ [0,T]). For any square-integrable random variables X,Y on that probability 
space (f2,.F,P), we respectively denote by K(X) and Var (AT) the expected value and 
the variance of X with respect to the probability measure P, and by Cov(X;Y) the 
covariance between X and Y . 

For every AeA (A being the set of parameter values), the Ito processes 
(X x £M. d ,t£ [0,T]) with deterministic initial condition x£M. d are well defined as the 
solutions to the SDEs ( |1.2) under suitable assumptions on b x and <r A , for instance pro- 
vided b x and o x satisfy Lipschitz and growth conditions [13]. Let (X x ) be solutions to 
the SDEs, and f x , g x be measurable functions such that Z x is a well-defined integrable 
random variable (Z x £ Lp(f2)). Then, Kolmogorov's strong law of large numbers holds 
and, denoting by (m=l,...,M) M independent copies of the random variables 
Z x (for all positive integer M), the output expectation E(Z A ) = J Q Z x d¥ can be ap- 
proximated (almost surely) by Monte-Carlo estimations of the form: 

i M 

m— 1 

Furthermore, assume that the random variable Z x is square integrable (Z x £ Lp(Cl)) 
with variance Var(Z A ), then an asymptotic error bound for the convergence occuring 



in (2.1| is given in probabilistic terms by the Central Limit Theorem as confidence 



intervals: for all a > 0, 

v(\E M (Z x )-E(Z x )\<a X l^^ \ ^ / '-^<l.r. (2.2;. 




In terms of the error bound (2.2 1, an approximation Em (^ A ) of the output E (Z A ) 
is thus all the better, for a given M, as the variance Var (Z A ) is small. In a many- 
query framework, the computation of approximations ( |2.1[ | for many outputs E (Z A ) 
(corresponding to many queried values of the parameter A £ A) would then be all the 
faster as the variance Var (Z x ) for some A € A could be decreased from some knowledge 
acquired from the A £ A computed beforehand. This typically defines a many-query 
setting with parametrized output suitable for a reduced-basis approach similar to the 
reduced-basis method developped in a deterministic setting for parametrized PDEs. 



In addition, the convergence (2.1 1 controlled by the confidence intervals (2.2 I can 
be easily observed using computable a posteriori estimators. Indeed, remember that 
since the random variable Z x has a finite second moment, then the strong law of large 
numbers also implies the following convergence: 

Var M (Z A ) : =E M ((Z A -E M (Z A )) 2 )^^Var(Z A ) . (2.3) 



Combining the Central Limit Theorem with Slutsky theorem for the couple of Monte- 
Carlo estimators (Em (Z x ) , Varjvr (Z x )) ( see f° r instance j5], exercise 7.2.(26)), we 
obtain a fully computable probabilistic (asymptotic) error bound for the Monte-Carlo 
approximation ( |2.1[ | of the output expectation: for all a > 0, 



It is exactly the purpose of variance reduction techniques to reduce the so-called 
statistical error appearing in the Monte-Carlo estimation of the output expectation 



E(Z A ) through the error bound (2.2 1. And this is usually achieved in practice by 



using the (a posteriori) estimation (2.4 I 



Remark 2.1 (SDE discretization and bias error in the output expectation). 
In practice, there is of course another source of error, coming from the time- 



discretizations of the SDE (1.2 I and of the integral involved in the expression for Z x . 

In the following (for the numerical applications), we use the Euler-Maruyama 
numerical scheme with discretizations = to <t\ < ■■■ <tjq~T (N (zN) of the time 
interval [0,T] to approximate the ltd process (X x ): 

= Xri-l + \tn~ fr A (^i!-l;^n-l) + \/Rn — *n- 1 (*n- 1 ) X n-l) ! 

X x = x, 

where {G n , n = 0, ...,N — 1} is a collection of N independent d-dimensional normal 
centered Gaussian vectors. It is well-known that such a scheme if of weak order one, so 
that we have a bound for the bias due to the approximation of the output expectation 
E(Z X ) by E(Z X ) (where Z x is a time-discrete approximation for Z x computed from 

(X x ) with an appropriate discretization of the integral J Q T f x (s,X x )ds): 
|E (Z x ) E (Z x ) | ^ O (mnfitn ~ tn-l\) 

The approximation of the output E(Z A ) by Em (Z x ) thus contains two types of errors: 

• first, a bias E(Z A — Z X ) due to discretization errors in the numerical integra- 
tion of the SDE ( |1.2) and of the integral involved in Z x , 

• second, a statistical error of order yjv&r (Z A ) ~]~M in the empirical Monte- 
Carlo estimation Fim(Z x ) of the expectation E(Z A ). 

We focus here on the statistical error. 

2.2. Variance reduction with the control variate method 

The idea of control variate methods for the Monte-Carlo evaluation of E (Z x ) is 
to find a so-called control variate Y x (with Y x £Lp(il)), and then to write: 

E(Z x )=E(Z x -Y x )+E(Y x ) , 

where E(V A ) can be easily evaluated, while the expectation E(Z X — Y x ) is approx- 
imated by Monte-Carlo estimations that have a smaller statistical error than direct 
Monte-Carlo estimations of E(Z A ). In the following, we will consider control variates 
Y x such that E(Z X ) =E(Z X -Y X ), equivalently 



E(Y A )=0. 
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The control variate method will indeed be interesting if the statistical error of the 
Monte-Carlo estimations Em(Z x — Y x ) is significantly smaller than the statistical 
error of the Monte-Carlo estimations Em(Z x ). That is, considering the following 
error bound given by the Central Limit Theorem: for all a > 0, 



\E M (Z X -Y X )-E(Z X 



<a 



Var(Z A -F A ) \ M - 



M 



-x 2 /2 



/2tt 



-dx. 



(2-5) 



the Monte-Carlo estimations Em(Z x — Y x ) will indeed be more accurate approxima- 
tions of the expectations E(Z A ) than the Monte-Carlo estimations Fjm(Z x ) provided: 

Var(Z A ) >Var(Z A -Y A ) . 

Clearly, the best possible control variate (in the sense of minimal variance) for a 
fixed parameter A S A is: 



Y X = Z X -E(Z X ) , 



(2-6) 



since we then have Var(Z A — F A ) =0. Unfortunately, the result E(Z A ) itself is nec- 
essary to compute Y x as Z x — E(Z A ). 

In the following, we will need another representation of the best possible con- 
trol variate Z x — E(Z A ). Under suitable assumptions on the coefficients b x and a x 
(for well-posedness of the SDE), plus continuity and polynomial growth conditions 
on f x and g x , let us define u x (t,y), for (t,y) € [0,T] x M d , as the unique solution 
u x (t,y) e C 1 ([0,T],C 2 (R d )) to the backward Kolmogorov equation ( |2~7| satisfying 
the same polynomial growth assumptions at infinity than f x and g x (for instance, see 
Theorem 5.3 in [7]): 



d t u x + b x (t,y)-Vu x + -<j x (t iy )a x (t,y) T :V 2 u x = f x (t,y) , 
u x (T,y)=g x (y), 



(2-7) 



where the notation Vw A 



means V y u (t,y) and a {t,y)a (t,y) < :V ' 2 u means 
E^ |fc =i^(*.y)otfc(*.»)^ <I y J « A (*.»)- Using Ito formula for (u x (t,X x ) 7 t e [0,T]) 
with u x solution to ( |2.7| , we get the following integral representation of Z x (see also 
Appendix |A] for another link between the SDE ( 1.2 1 and the PDE (2.7l, potentially 
useful to numerics): 



g x (X x )- f T f x (s,X x )d S = u x (0,x)+ [ T Vu x ( S ,X x )-<j x (s,X x )dB s . (21 
Jo Jo 



Note that the left-hand side of (2.81 is Z x , and the right-hand side is the sum of a 
stochatic integral (with zero mean) plus a scalar u x (0,x) (thus equal to the expected 
value E(Z A ) of the left-hand side). Hence, the optimal control variate also writes: 

r-T 



Y X = Z X -E(Z X )= I Vu x (s,X x )-a x (s,X x )dB s 
Jo 



(2.9) 



Of course, the formula (2.9 1 is again idealistic because, most often, numerically 



solving the PDE (2.7 1 is a very difficult task (especially in large dimension d>A). 
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2.3. Outline of the algorithms 



Considering either (2.6 1 or (2.9 1, we propose two algorithms for the efficient online 
computation of the family of parametrized outputs {E(Z A ) ,A€ A}, when the param- 
eter A can take any value in a given range A, using (for each A £ A) a control variate 
built as a linear combination of objects precomputed offline. 

More precisely, in Algorithm 1, we do the following: 
• Compute offline an accurate approximation Y x of Y x using (2.61, for a small 



set of selected parameters A € {Ai, . . . , A/} C A (where JsN>o). 

• For any A € A, compute online a control variate for the Monte-Carlo estima- 
tion of E (Z x ) as a linear combination of {Y Xi ,i = 1, ...,/} : 

i=i 

And in Algorithm 2, we do the following: 

• Compute offline an accurate approximation u x of the solution u x to the 



Kolmogorov backward equation (2.7| for a small set of selected parameters 
Ae{Ai,...,A/}cA. 

• For any A € A, compute online a control variate for the Monte-Carlo compu- 
tation of E(Z A ), in view of ( |2.9| , as a linear combination of J Q T Vu Ai (s, X x ) ■ 
cr x (s,X x )dB s (where i = l,...,I): 

*? = X>« / Vu x *(s,X x )-a x (s,X x )dB s . (2.10) 
i=i J ° 

For a fixed size / of the reduced-basis, being given a parameter A, both algorithms 
compute the coefficients [i x , i = 1, . . . , /, with a view to minimizing the variance of the 
random variable Z x — Y x (in practice, the empirical variance VarM(Z A — Y x )). 

For the moment being, we do not make further precise how we choose the set of 
parameters {Ai,...,A/} offline. This will be done by the same greedy procedure for 
both algorithms, and will be the subject of the next section. Nevertheless, we would 
now like to make more precise how we build offline: 

- in Algorithm 1, approximations {Y Xi ,i — 1, . . . ,/} for {Y Xi ,i = 1, . . . ,/}, and 

- in Algorithm 2, approximations {Vu Ai ,i — 1, . . . ,/} for {Vit Ai ,i = 1, . . . ,/}, 
assuming the parameters {Aj,i = l,...,J} have been selected. 

For Algorithm 1, Y Xi is built using the fact that it is possible to compute offline 
accurate Monte-Carlo approximations Ejvf(Z Ai ) of E(Z Ai ) using a very large number 
AI = Afiaigo of copies of Z Xi , mutually independent and also independant of the copies 
of Z x used for the online Monte-Carlo estimation of E (Z x ) , A ^ Xi (remember that the 
amount of offline computations is not meaningful in the case of a very large number 
of outputs to be computed online). The quantities FiM laisc (Z Xi ) are just real numbers 
that can be easily stored in memory at the end of the offline stage for re-use online 
to approximate the control variate Y Xi = Z Xi — E(Z Ai ) through: 

Y x > = Z x >-V Ml (Z x <). (2.11) 



For Algorithm 2, we compute approximations u Xi as numerical solutions to the 
Kolmogorov backward equation (2.7 1. For example, in the numerical results of Sec- 
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tion[4] the PDE (2.7) is solved numerically with classical deterministic discretization 
methods (like finite differences in the calibration problem for instance). 

Remark 2.2 (Algorithm 2 for stochastic processes with large dimension d). Most 
deterministic methods to solve a PDE (like the finite difference or finite elements 
methods) remain suitable only for d<3. Beyond, one can for example resort to prob- 
abilistic discretizations: namely, a Feynman-Kac representation of the PDE solution, 
whose efficiency at effectively reducing the variance has already been shown in [21]. 
We present this alternative probabilistic approximation in Appendix^^ but we will not 
use it in the present numerical investigation. 

One crucial remark is that for both algorithms, in the online Monte-Carlo com- 
putations, the Brownian motions which are used to build the control variate (namely 
Z Xi in (2.111 for Algorithm 1, and the Brownian motion entering Y x in (2.10) for 
Algorithm 2) are the same as those used for Z x . 

Note last that, neglecting the approximation errors Y Xi — Y Xi and u Xi — u Xi in the 
reduced-basis elements computed offline, a comparison between Algorithms 1 and 2 
is possible. Indeed, remembering the integral representation: 

Y x * = [ Vu x *( S ,X x >)-o- x >(s,X x >)dB s , 
Jo 

we see that the reduced-basis approximation of Algorithm 1 has the form: 



1 r 1 

„-_i Jo 



Vu x *( S ,X x *)-a x >( S ,X x *)dB s , 



while the reduced-basis approximation of Algorithm 2 has the form: 

= / Vu x *(s,X x )-a x ( S ,X x )dB s . 

i=i J o 

The residual variances Var (Y x — Y x ) for Algorithms 1 and 2 then respectively read 



as: 



E 



and: 



Vu x ■ cr x (s,X x ) -J2^Vu x > ■ a x ' (s,X Xi ) 



J2rfVu x >j-* x ( S ,X x ) )ds. 



ds, 



(2.12) 




(2.13) 



The formulas (2.12) and (2.13) suggest that Algorithm 2 might be more robust than 



Algorithm 1 with respect to variations of A. This will be illustrated by some numerical 
results in Section 0J 

3. Practical variance reduction with approximate control variates 

Let us now detail how to select parameters {A^ € A,i = 1, . . . , J} offline inside a large 
a priori chosen trial sample A tria i C A of finite size, and how to effe ctively compute 

for details 



the coefficients (/^)i=i,...,i in the linear combinations Y x (see Section 
about practical choices of A tr i a i C A) . 
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3.3.2 



Offline: select parameters {Xi € A tr iai , i = 1, ...,/} in A tr i a i C A a large finite sample. 
Selection under stopping criterium: maximal residual variance < s. 
Let Ai € A be already chosen, 

Compute accurate approximation EM largo (^ Al ) of E(Z Xl ). 
Greedy procedure: 

For step i=l,...,I-l (J>1): 



For all A G A tr iai, compute Y x as (3.2 1 and (cheap) estimations: 
e< (A) := Vax MmaU - *j X ) for Var (z A - K A ) . 

Select Aj + i 6 argmax {ej(A)}, 

AeA tria i\{A 3 j i >) 

If stopping criterium £j(Aj_|_i) <e, Then Exit Offline. 

Compute accurate approximation EM largc (Z Ai+1 ) of E(Z Ai+1 ). 
Fig. 3.1. Offline stage for Algorithm 1: greedy procedure in metalanguage 



3.1. Algorithm 1 

Recall that some control variates Y x are approximated offline with a computa- 
tionally expensive Monte-Carlo estimator using Mi argc ^ 1 independent copies of Z x : 



Y X = Z X -E M ^JZ X )^Y X , (3.1) 

for only a few parameters {Xi,i — 1, ...,/} C Atrial to be selected. The approximations 
Y Xi are then used online to span a linear approximation space for the set of all control 
variates {Y x , A G A}, thus linear combined as Y x . For any i = l,...,I, we denote by Y x 
(for any A € A) the reduced-basis approximation of Y x built as a linear combination 
of the first i selected random variables {Y Xj ,j 



Y t x = Y,^ x ^Y x , (3.2) 

where (/i A )j=i,...,i € M l is a vector of coefficients to be computed for each A (and each 
step i, but we omit to explicitly denote the dependence of each entry [i x , j — on 
i). The computation of the coefficients (/U A )j=i,...,j follows the same procedure offline 
(for each step i — 1) during the reduced-basis construction as online (when 

i = T): it is based on a variance minimization principle (see details in Section 3.1.2| . 



With a view to computing E(Z A ) online through computationally cheap Monte- 
Carlo estimations Fim sius .h(Z x ~Y X ) using only a few M sma u realizations for all AG A, 
we now explain how to select offline a subset {Xi, i— 1, ...,/} C Atrial m order to min- 
imize Var^Z A — Y x ^j (or at least estimators for the corresponding statistical error). 

3.1.1. Offline stage : parameter selection 

The parameters {Aj, i — !,...,!} are selected incrementally inside the trial sample 



Atrial following a greedy procedure (see Fig. 3.1 1. The incremental search between 
steps i and i + 1 reads as follows. Assume that control variates {Y Xj ,j = l,...,i} have 



already been selected at the step i of the reduced basis construction (see Remark 3.4 



for the choice of Y Xl ). Then, Y Xi+1 is chosen following the principle of controlling the 



maximal residual variance inside the trial sample after the variance reduction using 
the first i selected random variables: 



A l+ i £ argmax Var (z x - fA , (3.3) 

AGA tri al\{Aj ,j = l,...,i} V ' 



where the coefficients (pj)j=i i entering the linear combinations Y x in (3.2 I are 



computed, at each step j, like for Y x in the online stage (see Section 3.1.21. 



In practice, the variance in (3.31 is estimated by an empirical variance : 



Var (z x - Y x ) ~ Vax MsmaU (Z x - Y x ) . 



In our numerical experiments, we use the same number M sma n of realizations for the 
offline computations (for all A £ Atrial) as for the online computations, even though 
this is not necessary. Note that choosing a small number M sma n of realizations for the 
offline computations is advantageous because the computational cost of the Monte- 
Carlo estimations in the greedy procedure is then cheap. This is useful since Atrial ls 
very large, and at each step i, VaXM smaU (Z x — Y x ) has to be computed for all A £ Atrial- 
Remarkably, after each (offline) step % of the greedy procedure and for the next 
online stage when i = I, only a few real numbers should be stored in memory, namely 
the collection {E>M\„ e (Z j ), j — 1, ■ ■ ■ ,i} along with the corresponding parameters 



{Xj,j = l,...,i} for the computation of the approximations (3.1l. 

Remark 3.1. Another natural criterium for the parameter selection in the greedy 
procedure could be the maximal residual variance relatively to the output expectation 



max ,\ ' > ~ max M -" ( ) ■ (3.4) 

AGA trmI \E{Z X ) 2 AeA trlal E MHmall (Z x 2 



This is particularly relevant if the magnitude of the output E (Z x ^j is much more 
sensitive than that of Var (-Z A ) to the variations on A. And it also proved useful for 
comparison and discrimination between Algorithms 1 and 2 in the calibration of a 



local parametrized volatility for the Black-Scholes equation (see Fig. 4-5). 



3.1.2. Online stage : reduced-basis approximation 

To compute the coefficients (fJ, x )j=i,...,i in the linear combinations (3.2 1, both 



online for any A £ A when i — I and offline for each A £ A tr ; a i and each step i (see greedy 
procedure above), we solve a small-dimensional least squares problem corresponding 
to the minimization of (estimators for) the variance of the random variable Z x — Y x . 

More precisely, in the case i = I (online stage) for instance, the /-dimensional 
vector /i A = (n X )i<i<i is defined, for any \ £ A, as the unique global minimizer of the 
following strictly convex problem of variance minimization: 



argmin Var ( Z x - VV^ 1 J , (3.5) 

Mi)l<,</6K J \ i=1 / 



or equivalently as the unique solution to the following linear system : 
i 

^Cov(y Al ;y A ^ Mj A = Cov(y Al ;Z A ) ,Vi = l,...,J. (3.6) 
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Of course, in practice, we use the estimator (for X,Y £Lp(£l) and MgN>o) 



1 M ( 1 M \ ( 1 M 

m— 1 \ m — 1 / \ m— 1 



Cov M (x ; y). ... 

m— 1 \ m — 1 / \ m— 1 / 

to evaluate the statistical quantities above. That is, defining a matrix C Msma11 with 
entries the following empirical Monte-Carlo estimators & {1, ...,/}) : 



and a vector b Msl " a " with entries (i £{!,..., I}) bf Bma " = Cov MsmaU (y x ';Z x ^ , the lin- 
ear combinations (3.2 1 are computed using as coefficients the Monte-Carlo estimators 
which are entries of the following vector of M. 1 : 

M M small = [ G M small ] -l b M omalI _ (3 _ ?) 

The cost of one online computation for one parameter A ranges as the computation 
of M sma n (independent) realizations of the random variables (Z x ,Y Xl ,...,Y Xl ), plus 
the Monte-Carlo estimators Eju, mall , CovM amall , VarM amall and the computation of the 



solution ^i M sm»" to the (small /-dimensional, but full) linear system (3.7| 



In practice, one should be careful when computing (3-7 1, because the likely quasi- 
colinearity of some reduced-basis elements often induces ill-conditionning of the matrix 
C M,m '". Thus the QR or SVD algorithms [8] should be preferred to a direct inversion 



of (3.6) with the Gaussian elimination or the Cholevsky decomposition. One impor- 
tant remark is that, once the reduced basis is built, the same (small /-dimensional) 
covariance matrix C Mama11 has to be inverted for all A £ A, as soon as the same Brow- 
nian paths are used for each online evaluation. And the latter condition is easily 
satisfied in practice, simply by resetting the seed of the random number generator to 
the same value for each new online evaluation (that is for each new A G A). 

Remark 3.2 (Final output approximations and bounds). It is a classical result 
that, taking first the limit Mi ar „ c — >oo then M sma u— »oo, ^i M smaii _a : s. ^\ ^ 

M Bmall ,M largc ^oo 

So, the variance is indeed (asymptotically) reduced to the minimum Var(Z A — Y^j 
in (3.5), obtained with the optimal linear combination Y x of selected control variates 



Y Xi (without approximation). In addition, using Slutsky theorem twice successively for 
Monte-Carlo estimators of the coefficient vector fi x and of the variance Var \ Z X — Y X ) , 
it also holds a computable version of the Central Limit Theorem, which is similar 
to (2.4) except that it uses Monte-Carlo estimations of Z x — Y x instead of Z x to 



compute the confidence intervals ( and with successive limits Mi arge — > oo, M sma n — > oo). 
So our output approximations now read for all AgA; 



and asymptotic probabilistic error bounds are given by the confidence intervals (2.4 1. 
3.2. Algorithm 2 



In Algorithm 2, approximations Vu Ai of the gradients Vit Ai of the solutions *• x 



to the backward Kolmogorov equation (2.7 1 are computed offline for only a few pa- 



rameters {\i,i= 1, ...,/} C Atrial to be selected. In comparison with Algorithm 1, ap- 
proximations (Vu i )i=i 1 ...,/ are now used online to span a linear approximation space 
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Offline: select parameters {Xi € A tr iai , i = 1, ...,/} in A tr i a i C A a large finite sample. 
Selection under stopping criterium: maximal residual variance < s. 

Let Ai € A be already chosen, 

Compute approximation Vu Al of Vu Al . 
Greedy procedure: 

For step i=l,...,I-l (J>1): 



For all As A tr iai, compute as (3.8) and estimations: 

64 (A) := Vai MmaU - for Var (V - K A ) . 
Select Aj + i 6 argmax {ej(A)}, 

AeA tria i\{A 3 J I <l 

If stopping criterium £j(Aj_|_i) <e, Then Exit Offline. 
Compute approximation Vw Ai+1 of Vu Ai+1 . 

Fig. 3.2. Offline stage for Algorithm 2: greedy procedure in metalanguage 



for {Vit A , A 6 A}. At step i of the greedy procedure (i = l,...,I), the reduced-basis 
approximations for the control variates Y x read (for all A £ A) : 

i 

^ A = £/^'«^\ (3.8) 

y A Aj - / V^(s,X s A )-a A ( S) X s A )dB s . (3.9) 
Jo 

where (/i A ) J=1 j are coefficients to be computed for each A (again, the dependence 
of /i A on the step i is implicit). Again, the point is to explain, first, how to select pa- 
rameters {\i,i=l,...,I}c Atrial m the offline stage, and second, how to compute the 
coefficients (^ A )j=i,...,i in each of the i-dimensional linear combinations Y x . Similarly 
to Algorithm 1, the parameters {A^, i — 1, ...,/} C Atrial are selected offline following 
the greedy procedure, and, for any i=l,...,I, the coefficients in the lin- 

ear combinations offline and online are computed, following the same principle of 
minimizing the variance, by solving a least squares problem. 

3.2.1. Offline stage : parameter selection 

The selection of parameters {Xj,j — l,...,i} from a trial sample A tr iai follows a 
greedy procedure like in Algorithm 1 (see Fig. |3.2[ |. In comparison with Algorithm 1, 
after i (offline) steps of the greedy procedure (1 <i <I— 1) and online (i = I), note that 
discretizations of functions (t,y) — >Vm Aj (t,y), j = l,...,i + l, are stored in memory to 



compute the stochastic integrals (3.8 1, which is possibly a huge amount of data. 



3.2.2. Online stage : reduced-basis approximation 

Like in Algorithm 1, the coefficients (/c)j=i,...,t m the linear combination (3.8 1 



are computed similarly online (and then i = I) for any A € A and offline (when 1 < i < 
1—1) for each A € A tr iai as minimizers of - a Monte Carlo discretization of - the least 
squares problem: 

minVar ^Z A -^^f As ^ , (3.10) 
11 



where we recall that Y x ' are defined by (3.9 1. Note that contrary to the reduced-basis 
elements Y Xi in Algorithm 1, the elements Y x l in Algorithm 2 have to be recomputed 
for each queried parameter value A G A. 

Again, in practice, the unique solution (/^ ) J -=i 1 ... l i to the variational prob- 



lem (3.101 is equivalently the unique solution to the following linear system: 

Cov ; Y^ ) $ = Cov (y A A * ; Z A ) , V* = 1 , . . . , I , (3.11) 

3=1 

and is in fact computed as the unique solution to the discrete minimization problem: 

^M 5mall = [ C M smaU j -l b M smalI ^ (312) 

with C^-" = Cov Mamall (y?;Y?) and bf-» = Cov MamalI (y x ';Z x ) • 

The cost of one computation online for one parameter A is more expensive than 
that in Algorithm 1, and ranges as the computation of M sma ii independent realiza- 
tions of Z x , plus the computation of / (discrete approximations of) the stochastic 
integrals ( |3.9) , plus the Monte-Carlo estimators and the solution ^ M *™=iii to the (small 
/-dimensional, but full) linear system ( 3.12| . In comparison to Algorithm 1, notice 



that the (discrete) covariance matrix Q ra ° ma11 to be inverted depends on A, and thus 
cannot be treated offline once for all: it has to be recomputed for each A £ A. 

3.3. General remarks about reduced-basis approaches 

The success of our two reduced-basis approaches clearly depends on the variations 
of Z x with As A. Unfortunately, we do not have yet a precise understanding of 
this, similarly to the PDE case [23]. Our reduced-basis approaches have only been 
investigated numerically in relevant cases for application (see Section |3J. So we now 
provide some theoretical ground only for the a priori existence of a reduced basis, like 
in the PDE case [18], with tips for a practical use of the greedy selection procedure 
based on our numerical experience. Of course, it remains to show that the greedy 
procedure actually selects a good reduced basis. 

3.3.1. A priori existence of a reduced basis 

Following the analyses [18, 23 for parametrized PDEs, we can prove the a priori 
existence of a reduced basis for some particular collections of parametrized control 
variates, under very restrictive assumptions on the structure of the parametrization. 
Proposition 3.3. Assume there exist collections of uncorrelated (parameter- 
independent) random variables with zero mean Yj£Lp(fl), l<j< J, and of positive 
C°°(K) functions gj, l<j<J, such that 

J 

i=i 

and there exists a constant C > such that, for all parameter ranges A = [A m j n , A max ] C 
R, there exists a C°° diffeomorphism t\ defined on A satisfying: 

sup sup {g j oTl 1 ) {M \\)<M\C M , for all M-derivatives of gjor^ 1 . (3.14) 

l<J<-'Aer A (A) 

Then, for all parameter ranges A = [A m i„,A max ]cR, there exist constants 

Ci,C2>0 independent of A and J such that, for all JVgN>o, 7V>A^:=1 + 
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Ci (TA(A max ) — TA(A m i n )), there exist N distinct parameter values X 1 ^ GA, n- 
1,...,N, (with X^ <A^ +1 ), sastisfying, with = Span (y a ™ , n — 1 , . . . , N^j : 



inf Yax(Z x -Y N ) < e"^ 1 ^ 1 ) Var (Z A ) , VAeA. (3.15) 



One can always write F like (3.131 with uncorrelated random variables (using a 



Gram-Schmidt procedure) and with positive coefficients (at least on a range A where 



they do not vanish). But the assumption (3.14 1 is much more restrictive. The mapping 



ta for the parameter, which depends on the functions gj, j = l,...,J, indeed tells us 
how the convergence depends on variations in the size of the parameter range A. 
See |18l [23] for an example of such functions gj and ta, and Appendix [B] for a short 
proof inspired from |18l [23] . 

The Proposition|3.3|may cover a few interesting cases of application for the a priori 



existence theory. One example where the assumption (3.13) hold is the following. 
Consider an output Z x = g(X^) with g a polynomial function, and : 



t ft 

A/ „\ v-A j„ i / _A 



X x = x + / b x (s)X x ds + / a x (s)dB s . (3.16) 



The optimal control variate Y x in such a case writes in the form ( 3.13| (to see this, 



one can first explicitly compute the reiterated (or multiple) Ito integrals in the poly- 
nomial expression of g(X%) with Hermite polynomials [12]). Then, ( |3.14[ | may hold 



provided b x and cr x are smooth functions of A £ A (again, see |18l 123] for functions 



gj satisfying ( 3.14 1 ) . But quite often, the reduced bases selected in practice by the 



greedy procedure are much better than 3-V (see [23] for comparisons when A is scalar). 

3.3.2. Requirements for efficient practical greedy selections 

A comprehensive study would clearly need hypotheses about the regularity of Y x 
as a function of A and about the discretization A tr j a i of A to show that the greedy 
procedure actually selects good reduced bases. We do not have precise results yet, 
but we would nevertheless like to provide the reader with conjectured requirements 
for the greedy procedure to work and help him as a potential user of our method. 

Ideally, one would use the greedy selection procedure directly on {Y x ,XgA} for 
Algorithm 1 and on {Vu A ,Ae A} for Algorithm 2. But in pratice, one has to resort to 
approximations only, {F A ,AgA} for Algorithm 1 and {Vu A ,AeA} for Algorithm 2. 
So, following requirements on discretizations of parametrized PDEs in the classical 
reduced-basis method [23 , the stability of the reduced basis selected by the greedy 
procedure for parametrized control variates intuitively requires: 

(HI) For any required accuracy e > 0, we assume the existence of approximations, 
Y x for Y x in Algorithm 1 (resp. u x for u x in Algorithm 2), such that the 
L 2 -approximation error is uniformly bounded on A: 



VAeA,E(jy A -r A | 2 ) <s, 

^resp. E(\\7u x -\7u x \ 2 (X x ))dt<e or || Vm a - Vw A ||^ 2 <e^j . 

Moreover, in practice, one can only manipulate finite nested samples of parameter 
Atrial instead of the full range A. So some representativity assumption about A tr i a i is 
also intuitively required for the greedy selection procedure to work on A: 
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(H2) For any required accuracy e > 0, we assume the existence of a sufficiently 
representative finite discrete subset A tria i C A of parameters such that reduced 
bases built from Atrial ore still good enough for A. 
Refering to Section |3.3.1[ good enough reduced bases should satisfy exponential con- 



vergence like (3.15), with slowly deteriorating capabilities in terms of approximation 
when the size of the parameter range grows. Now, in absence of more precise result, 
intuitition has been necessary so far to choose good discretizations. The numerical 
results of Section [4] have been obtained with Mi arge = 100 M sm ali in Algorithm 1, and 
with a trial sample Atrial of 100 parameter values randomly chosen (with uniform 
distribution) in A. 

In absence of theory for the greedy procedure, one could also think of using an- 
other parameter selection procedure in the offline stage. The interest of the greedy 
procedure is that it is cheap while effective in practice. In comparison, another natural 
reduced basis would be defined by the first / leading eigenvectors from the Princi- 
pal Components Analysis (PCA) of the very large covariance matrix with entries 
Cov (Y Xi ;Y Xj ) ,, ... . . The latter (known as the Proper Orthogonal De- 
composition method) may yield similar variance reduction for most parameter values 
Ae A [23J, but would certainly require more computations during the offline stage. 
Remark 3.4. The choice of the first selected parameter Ai has not been precised 
yet. It is observed that most often, this choice does not impact the quality of the 
variance reduction. But to be more precise, we choose Ai € A sm aii trial such that Z Xl 
has maximal variance in a small initial sample A sma n trial C A, for instance. 

4. Worked examples and numerical tests 

The efficiency of our reduced-basis strategies for parametrized problems is now 
investigated numerically for two problems relevant to some applications. 

Remark 4.1 (High-dimensional parameter). Although the maximal dimension 
in the parameter treated here is two, one can reasonably hope for our reduced-basis 
approach to remain feasible with moderately high-dimensions in the parameter range 
A, say twenty. Indeed, a careful mapping of a multi-dimensional parameter range 
may allow for an efficient sampling Atrial that makes a greedy procedure tractable and 
next yields a good reduced basis for A, as it was shown for the classical reduced-basis 
method with parametrized PDEs f25l 

4.1. Scalar process with constant drift and parametrized diffusion 

4.1.1. Calibration of the Black— Scholes model with local volatility 

One typical computational problem in finance is the valuation of an option de- 
pending on a risky asset with value St at time t£ [0,T]. In the following we consider 
Vanilla European Call options with payoff c/)(St',K) =max(S , T — K,0), K being the 
exercise price (or strike) of the option at time t — T. By the no arbitrage principle for 
a portfolio mixing the risky asset of value S t with a riskless asset of interest rate r(t), 
the price (as a function of time) is a martingale given by a conditional expectation: 

e-^ r(s)ds E(^ T )|^ t ) (4.1) 

where, in the Black-Scholes model with local volatility, St = S* is a stochastic process 
solving the Black-Scholes equation: 

dS? = S?(r(t)dt + <j x (t,S?)dB t ) S^ = S Q , (4.2) 
14 



and (Tt) is the natural filtration for the standard Brownian motion (B t ). For this 
model to be predictive the parameter A in the (local) volatility a x needs to be cali- 
brated against observed data. 

Calibration, like many numerical optimization procedures, defines a typical many- 
query context, where one has to compute many times the price (4.1 1 of the option 
for a large number of parameter values until, for some optimal parameter value A, 
a test of adequation with statistical data P(K,ti) observed on the market at times 
tj € [0,T] , I = 0, . . . ,L is satisfied. For instance, a common way to proceed is to minimize 
in A the quadratic quantity: 

J(A) = ^| e -^ r(s)ds E(0(^;X)|^)-^)' 2 

most often regularized with some Tychonoff functional, using optimization algorithms 
like descent methods which indeed require many evaluations of the functional J{\) 
for various A. One could even consider the couple (K,T) as additional parameters to 
optimize the contract, but we do not consider such an extension here. 

Note that the reduced-basis method for parameterized PDEs |17l H8l [23] has 
recently proved very efficient at treating a similar calibration problem j24] . Our 
approach is different since we consider a probabilistic pricing numerical method. 

In the following numerical results, we solve (4.1 1 for many parameter values as- 
suming that the interest rate r is a fixed given constant and the local volatility a x 



has "hyperbolic" parametrization (|4.3[l (used by practitionners in finance): 



1 T 



^=^[c^) + cm ] (4 - 3) 



where 



C(t, S) = i (yC A (t, 5) 2 + + C A (t, S)) with: 



C A (t,S) = a+ 1 -J (b-c^log 2 ( — ^-)+Aa^ + hb + c)log( ' S 



2y \aSoe rt J 2 \aSoe 

The local volatility a x is thus parametrized with a 7-dimensional parameter A = 
(a,6,c,d,a,r,C min ). 

Our reduced-basis approach aims at building a vector space in order to approxi- 
mate the family of random variables: 

{Y x := e-' T max(S£ -K,0)- er rT E (m&x(S^ - K,0)) , A € A} , 

which are optimal control variates for the computation of the expectation of 
e" rT max(S'y — K,0). In Algorithm 2, we also use the fact that 

Y x = / d s u x (t,S x )a x (t,S x )S x dB tl (4.4) 
Jo 

where the function u x (t,S) solves for (t,S) € [0,T) x (0,oo): 

d t u x (t, S) + rSd s u x (t, S) + aX{t ^ )2S2 d SS u X (t, S) = , (4.5) 
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with final condition u x (T,S) = e rT max(5 — A",0). Note the absence of boundary con- 
dition at 5 = because the advection and diffusion terms are zero at S = 0. The back- 



ward Kolmogorov equation (4.51 is numerically solve using finite differences [1J. More 
precisely, after a change of variable u x (t,S) = e~ rt C x (t,S), equation (|4.4| rewrites: 



Y x = [ T e- rt d s C x (t,S x )a x (t,S x )S x dB t , (4.6) 
Jo 

where C x (t,S) solves the classical Black-Scholes PDE: 

d t C x (t,S) -rC x {t,S) + rSd s C x (t,S) + °^^-^- d S sC X (t,S)=0, (4.7) 

with the final condition C X (T, S) = max(S — K, 0). In the case of a low-dimensional 
variable St (like one-dimensional here), one can use a finite differences method 
of order 2 (with Crank-Nicholson discretization in time) to compute approxima- 
tions C X j ~C x (ti,Xj), l — 0,...,L, j — 0,...,J on a grid for the truncated domain 
[0,T] x [Q,3K] C [0,T] x [0,oo), with L = 100 steps in time and J = 300 steps in space of 
constant sizes (and with Dirichlet boundary condition C x J+l = (3 — e~ r ( T ~ tl >)K ,Vl = 
0,...,N at the truncated boundary). An approximation C x (t,S) of C x (t,S) at 
any (t,S) € [0,T] x [0,3iT] is readily reconstructed as a linear interpolation on tiles 
(t,S)€[ti,t l+1 ]x[S j ,S j+1 ]. 



4.1.2. Numerical results 

The Euler-Maruyama scheme with N= 10 2 time steps of constant size At = S 



10 2 is used to compute one realization of a pay-off max(iS^ — K,Q), for a strike 
K = 100 at final time tjy=T=l when the initial price is Sq — 90 and the interest rate 

r = 0.04. Then, (a large number of) expectations E are approxi- 

mated through Monte-Carlo evaluations ^M smaU ^max(S'^, — if,0)^ with M sma u = 10 3 

realizations, when the local volatility parameter A= (a,6,c,d,a,r,C m i n ) assumes many 
values in the two-dimensional range A= [— .05, .15] x {b — c£ [.5,1.5]} x {1.} xjl.l} x 



{5}x{.05} (variations of the function a (t,S) with A are shown in Fig. 4.1 1. We 



build reduced bases of different sizes I = 1, . . . , 20 from the same sample A tr i a i of size 



| Atrial | = 100, either with Algorithm 1 (Fig. 4.2 and 4.4 1 using approximate control 



variates computed with Mi argc = 100M sma n evaluations : 

= E ^ maU fXz = E ^ sma " ( max (^ 1 -K,Q)- £ Mlargc (max(^ - K, 0) 



or with Algorithm 2 (Fig. 4.3 and 4.4 1 using approximate control variates: 



^ A = E^ sma11 E e~ rtn dsC Xi (t n ,S x )a\t n ,S x ) y/\ t n+l -t n \G r , 



computed as first-order discretizations of the Ito stochastic integral (4.6 1 using the 
finite-difference approximation of the solution to the backward Kolmogorov equation. 
We always start the greedy selection procedure by choosing Ai such that Y Xl has 
the maximal correlation with other members in A sma n trial; a small prior sample of 10 
parameter values chosen randomly with uniform law in A, see Remark |3.4| 
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Fig. 4.1. Variations of the "hyperbolic" local volatility function o-*(t,S) with re- 
spect to 5e [50,150]. Six families of curves are shown (as time t evolves in [0,1]) 
for extremal and mid- values of the parameter (a,b = c) in [— .05, .15] x {f> = cG [.5,1.5]}: 
(min(a),min(fe = c)), (min(a),max(6 = c)), (med(a) := .5min(a) + .5max(a),min(b = c)), (mcd(a) := 
.5min(a) + .5max(a),max(fo = c)), (max(a),min(6 = c)), (max(a),max(fe = c)). Each family of curves 
shows the time variations of S ^ a (t,S) for t6 {.1 X k\k = 0,...,l0}). 



We show in Fig. |4.2| and |4.3| the absolute variance after variance reduction : 

Var Msmall (ma(4 -K,0)- f/) , (4.S 



(4.9) 



and in Fig. |4.4| the relative variance after variance reduction : 

Var M5mall (max(^ -K,Q)-Yf 

E Msmall (max(S* - K, 0) - f/) 

In each figure, the maximum, the minimum and the mean of one of the two residual 
variance above is shown, either within the offline sample deprived of the selected 
parameter values A tria i\{Aj,i = 1,. ..,/}, or within an online uniformly distributed 
sample test A test cA of size |A test | = 10|A triaJ |. 

It seems that Algorithm 1 slighlty outperfoms Algorithm 2 with a sufficiently large 
reduced basis, comparing the (online) decrease rates for either the relative variance or 
the absolute variance. Yet, one should also notice that, with very small-dimensional 
reduced basis, the Algorithm 2 yields very rapidly good variance reduction. Compar- 
ing the decrease rates of the variance in offline and online samples tells us how good 
was the (randomly uniformly distributed here) choice of Atrial- The Algorithm 2 seems 
more robust than the Algorithm 1 for reproducing ("extrapolating") offline results from 
a sample Atrial m the whole range A. So, comparing the first results for Algorithms 1 
and 2, it is not clear which algorithm performs the best variance reduction for a given 
size of the reduced basis. 



Now, in Fig. 4.5 and 4.6 we show the online (absolute and relative) variance for a 
new sample test of parameters A tG stwide uniformly distributed in A w id c = [—.15,. 25] x 

17 





W 2 4 6 8 10 12 14 16 18 w 2 4 6 8 10 12 14 16 18 20 



Fig. 4.2. Algorithm 1 for Black-Scholes mo del w ith local "hyperbolic" volatility: Minimum +, 
mean x and maximum o of the absolute variance ( |4.8) in samples of parameters (left: offline sample 
Atrial = It ■•• i^}; right: online sample A tcs t) with respect to the size I of the reduced basis. 
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Fig. 4.3. Algorithm 2 for Black-Scholes mo del w ith local "hyperbolic" volatility: Minimum +, 
mean x and maximum o of the absolute variance ( |4.8) in samples of parameters (left: offline sample 
Atrial \{-^i>* = lj ■•• >!}> right: online sample Atcst ) with respect to the size I of the reduced basis. 
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Fig. 4.4. Algorithm 1 (left) and 2 (right) for Black-Scholes model with local "hyperbolic" 
volatility: Minimum +, mean x and maximum o of the relative variance ( |4.9[ in a sample test 
(online) At es t of parameters with respect to the size I of the reduced basis. 
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{6 = ce]0,2[} x {1.} x {1.1} x {5} x {.05}, which is twice larger than A=[-.05,.15] x 
{i) = c£ [.5,1.5]} x {1.} x {1.1} x {5} x {.05} where the training sample A trial of the of- 
fline stage is nested : the quality of the variance reduction compared to that for a 
narrower sample test A tes t seems to decrease faster for Algorithm 1 than for Algo- 
rithm 2. So Algorithm 2 definitely seems more robust with respect to the variations 
in A than Algorithm 1. This observation is even further increased if we use the rela- 



tive variance < \4.9\ instead of the absolute variance (4.8), as shown by the results in 
Fig. [43] and [4^| 
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Fig. 4.5. Algorithm 1 (left) and 2 (right) for Black-Scholes model with l ocal "hyperbolic" 
volatility: Minimum +, mean x and maximum o of the (online) absolute variance ( |4.8| ) in a sample 
test A testw i(j e of p aram eters with respect to the siz e I of the reduced basis. Greedy selection with 
absolute variance (|4. 8| (top) and relative variance (|4.9[ (bottom). 



4.2. Vector processes with constant diffusion and parametrized drift 

4.2.1. Molecular simulation of dumbbells in polymeric fluids 

In rheology of polymeric viscoelastic fluids, the long polymer molecules responsible 
for the viscoelastic behaviour can be modelled through kinetic theories of statistical 
physics as Rouse chains, that is as chains of Brownian beads connected by springs. 
We concentrate on the most simple of those models, namely "dumbbells" (two beads 
connected by one spring) diluted in a Newtonian fluid. 

Kinetic models consist in adding to the usual velocity and pressure fields (u,p) 
describing the (macroscopic) state of the Newtonian solvent, a field of dumbbells 
represented by their end-to-end vector X t (x) at time t and position x in the fluid. 
Vector stochastic processes (X t (x)) encode the time evolution of the orientation and 
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Fig. 4.6. Algorithm 1 (left) and 2 (right) for Black-Scholes model with l ocal "hyperbolic" 
volatility: Minimum +, mean x and maximum o of the (online) relative variance ( |4.9| ) in a sample 
test A testw i(je of p aram eters with respect to the siz e I of the reduced basis. Greedy selection with 
absolute variance ( |4. 8| (top) and relative variance ( |4.9[ (bottom). 



the stretch of the dumbbells (the idealized configuration of a polymer molecule) for 
each position xdT> in a macroscopic domain V where the fluid flows. To compute 
the flow of a viscoelastic fluid with such multiscale dumbbell models |15] . segregated 
algorithms are used that iteratively, on successive time steps with duration T: 

• first evolve the velocity and pressure fields (it,p)of the Newtonian solvent 
under a fixed extra (polymeric) stress tensor field r(typically following Navier- 
Stokes'equations), and 

• then evolve the (probability distribution of the) polymer configurations vector 
field (X t (x)) surrounded by the newly computed fixed velocity field u. 

The physics of kinetic models is based on a scale separation between the polymer 
molecules and the surrounding Newtonian fluid solvent. On the one side, the polymer 
configurations are directly influenced by the (local) velocity and pressure of the New- 
tonian solvent in which they are diluted. Reciprocally, on the other side, one needs 
to compute at every xGT> the extra (polymeric) stress, given the Kramers formula: 

t(T,x)=E(X t (x)®F(X t (x))) , 

after one evolution step t£ [0,T] over which the polymer configurations have evolved 
(remember that here [0,T] should be understood as a timestep). The vector valued 
process X t (x) in R d (d=2 or 3) solves a Langevin equation at every physical point 
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xGT> (Eulerian description): 



dX t + u ■ X7 x X t dt = ((V £ ti) X t - F(X t )) dt + dB t . 

This Langevin equation describes the evolution of polymers at each x£T>, under an 
advection u-V x X t , a hydrodynamic force (V x u)X tl Brownian collisions (B t ) with 
the solvent molecules, and an entropic force F(X t ) specific to the polymer molecules. 
Typically, this entropic force reads either F(X) = X (for Hookean dumbbells), or 
F(X) = 1 _|x|2/b (f° r Finitely-Extensible Nonlinear Elastic or FENE dumbells, to 

model the finite extensibility of polymers: \X \ < Vb). 

In the following, we do not consider the advection term u-V x X t (which can be 
handled through integration of the characteristics in a semi-Lagrangian framework, 
for instance), and we concentrate on solving the parametrized SDE: 

dX t =(±X t -F{X t ))dt + dB t , (4.10) 

on a time slab [0,T], with a fixed matrix A(x) = V x u(x). We also assume, as usual for 
viscoelastic fluids, that the velocity field is incompressible (that is tr(A)=0), hence 
the parameter A is only (d 2 — l)-dimensional. 



This is a typical many-query context where the Langevin equation (4.10 1 has to be 
computed many times at each (discretized) position xGT>, for each value of the dxd- 
dimensional parameter A (since V x u(x) depends on the position x). Furthermore, 

the computation of the time-evolution of the flow defines a very demanding many- 
query context where the latter has to be done iteratively over numerous time steps 
of duration T between which the tensor field X(x) is evolved through a macroscopic 

equation for the velocity field u. 

Remark 4.2 (Initial Condition of the SDE as additional parameter). Let T o = 
and T n+ i = {n-\-\)T. Segregated numerical schemes for kinetic models of polymeric 



fluids as described above simulate ( 4.10[ | on successive time slabs [T n ,T n+ i], forn&l 



More precisely, on each time slab [T n ,T n+ i], one has to compute 

r(T n+1 )=E(X Tn+1 ®F(X Tn+1 )) 

= E(E(X Tn+1 ®F(X Tn+1 )\X Tn )) (4.11) 



at a fixed position x£T>. In practice, (4.11 1 can be approximated through 



R M 

^)4EiE ® F ( xr c, ) > ( 4 - 12 ) 



M 

r—l m—L 

after simulating MR processes {X r t ,m ) t£ Yr n ,T n+ i} driven by MR independent Brownian 
motions for a given set of R different initial conditions, typically: 

X r T J = X r T \ , r=l,...,R, m=l,...,M, 
or any X r r ^ na (with 1 <rriQ<M) given by the computation at final time of the pre- 



vious time slab [T n —i,T n ]. In view of (4.121, for a fixed r, the computation of 
F El'=i -^t™ i ® ^(-^t™ i ) using the algorithms presented above requires a modi- 
fication of the methods to the case when the initial condition of the SDE assumes 
many values. 
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To adapt Algorithm 1 to the context of SDEs with many different initial con- 
ditions, one should consider reduced bases for control variates which depend on the 
joint-parameter (X,x) where x is the initial condition of the SDE. And variations on 
the joint-parameter (X,x) can be simply recast into the framework of SDEs with fixed 
initial condition used for presentation of Algorithm 1 after the change of variable 
X*> x = X?-x, that is using the family of SDEs with fixed initial condition X ' x = 0; 

dX X ' x = b X ' x (t,X Xx )dt + a X ' x {t,X Xx )dB t , (4.13) 

where b X ' x (t,X) = b x (t,X + x), a X ' x (t,X) = a x (t,X + x), for all t, X and x. Then, with 
g X ' x (X)—g x (X + x) and f x - x (t,X) = f x (t,X + x), the output is the expectation of 

Z X ' x =g x {X X - x )- C f x { S ,X x < x )ds. 



And the corresponding "ideal" control variate reads Y x,x = Z X ' X — E(Z X ' X 

In Algorithm 2, note that u x solution to (2.7) does not depend on the initial 



condition used for the SDE. So, once parameters \i (i = l,...,I) have been selected 
offline, Algorithm 2 applies similarly for SDEs with one fixed, or many different, 
initial conditions. Though, the offline selection of parameters A^ using SDEs with 
many different initial conditions should consider a larger trial sample than for one 
fixed initial condition. Indeed, the selection criterium in the greedy algorithm does 
depend on the initial condition of the SDE. So, defining a trial sample of initial 
conditions A\c, the following selection should be performed at step i in Fig. \3. 2\ 

Select Aj+i € argmax maxVarM U (Z X,X — Y x ' x ) , 

AeA tri ai\{A 3 j = l,... :4 } a;eA ic 

where Z x,x and Y X ' X , defined like Z x and Y x , depend on x because the stochastic 
process (X x ) depends on Xq=x. 

It might be useful to build different reduced bases, one for each cell of a partition 
of the set of the initial condition. In summary, both algorithms can be extended to 
SDEs with variable initial condition, at the price of increasing the dimension of the 



parameter (see also Remark J i .l). 

Remark 4.3 (Multi-dimensional output). Clearly, the full output r in the prob- 
lem described above is three-dimensional (it is a symmetric matrix). So our reduced- 
basis approach such as presented so far would need three different reduced bases, one 
for each scalar output. Though, one could alternatively consider the construction of 
only one reduced basis for the three outputs, which may be advantageous, see UV for 
one example of such a construction. 

Note that it is difficult to compute accurate approximations of the solution to 



the backward Kolmogorov equation (2.7 1 in the FENE case, because of the nonlinear 
explosive term. It is tractable in some situations, see |16| |6] for instance, though at 
the price of computational difficulties we did not want to deal with in this first work 
on our new variance reduction approach. On the contrary, the backward Kolmogorov 



equation (2.7 1 can be solved exactly in the case of Hookean dumbells. Hence we have 
approximated here u x in Algorithm 2 by the numerical solution u x to the backward 



Kolmogorov equation (2.7 1 for Hookean dumbells, whatever the type of dumbbells 
used for the molecular simulation (Hookean or FENE). 

We would like to mention the recent work [H] where the classical reduced-basis 
method for parameterized PDEs has been used in the FENE case (solving the FENE 
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Fokker-Planck by dedicated deterministic methods). Our approach is different since 
we consider a stochastic discretization. 



4.2.2. Numerical results 

The SDE ([OJ for FENE dumbbells (when d = 2) is discretized with the Euler- 
Maruyama scheme using iV=100 iterations with a constant time step of A£=1CP 2 
starting from a (deterministic) initial condition _X"o = (l,l), with reflecting boundary 
conditions at the boundary of the ball with radius \fb. 

The number of realizations used for the Monte-Carlo evaluations, and the sizes of 
the (offline) trial sample A tr i a i and (online) test sample A tost for the three-dimensional 
matrix parameter A with entries (An = — A22, A12, A21), are kept similar to the previous 
Section 4.1 Samples A tr iai and A tcs t for the parameter A are uniformly distributed in 

a cubic range A = [— 1,1] 3 . We will also make use of an enlarged (online) test sample 
Atestwido, uniformly distributed in the range [— 2,2] 3 . 

When 6 = 9, the variance reduction online with Algorithm 1 is again very interest- 
ing, of about 4 orders of magnitude with 1 — 20 basis functions, whatever the criterium 



used for the selection (we only show the absolute variance, in Fig. 4.7 1. But when 



6 = 4, the reflecting boundary conditions are more often active, and the maximum 



online variance reduction slightly degrades (see Fig. 4.8 1. 

We first tested our variance reduction with Algorithm 2 for Hookean dumbells 
and it appeared to work well; but such a model is considered too simple generally. 
Then using the solution to the Kolmogorov backward equation for Hookean dumbells 
as u x in Algorithm 2 for FENE dumbbells still yields good variance reduction while 



the boundary is not touched (see Fig. 4.9 1 ; when 6 = 4 and many reflections at the 



boundary occur, the variance is hardly reduced. Again Algorithm 2 seems to be 
slightly more robust than Algorithm 1 in terms of extrapolation, that is when the 



(online) test sample is "enlarged" (see Fig 4.10 with 6=16 and a sample test (online) 

Atestwidc) ■ 

5. Conclusion and perspectives 

We have demonstrated the feasibility of a reduced-basis approach to compute 
control variates for the expectation of functionals of a parameterized Ito stochastic 
process. We have also tested the efficiency of such an approach with two possible 
algorithms, in two simple test cases where either the drift or the diffusion of scalar 
(d=l), and vector (rf = 2), Ito processes are parametrized, using 2- or 3-dimensional 
parameters. 

Algorithm 2 is less generic than Algorithm 1 ; it is basically restricted to low- 
dimensional stochastic processes (X t ) since: 

• one needs to solve (possibly high-dimensional) PDEs (offline), and 

• discrete approximations of the PDEs solutions on a grid have to be kept in 
memory (which is possibly a huge amount of data). 

Yet, Algorithm 2 seems more robust to variations in the parameter. 

From a theoretical viewpoint, it remains to better understand the convergence 
of reduced-basis approximations for parametrized control variates depending on the 
parametrization (and on the dimension of the parameter in particular), on the 
reduced-basis construction (following a greedy procedure) and on an adequate dis- 
cretization choice (including the computation of approximate control variates and the 
choice of a trial sample A tr i a i) ■ 

Acknowledgement. We acknowledge financial support from the France Israel 
Teamwork in Sciences. We thank Raz Kupferman, Claude Le Bris, Yvon Maday 

23 




10 2 4 6 8 10 12 14 16 18 w 2 4 6 8 10 12 14 16 18 20 



Fig. 4.7. Algorith m 1 for FENE model with b = 9: Minimum +, mean x and maximum o 
of the absolute variance ( |4.8) in samples of parameters (left: offline sample A tr i a i \ {A^, i = 1, . . . ,1}; 
right: online sample A tcst ) with respect to the size I of the reduced basis. 
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Fig. 4.8. Algorith m 1 for FENE model with b = 4: Minimum +, mean x and maximum o 
of the absolute variance ( |4.8) in samples of parameters (left: offline sample At f j a i\{Aj,i = l, ..-,/}; 
right: online sample At cs tj with respect to the size I of the reduced basis. 
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Fig. 4.9. Algorith m 2 for FENE model with b = 9: Minimum +, mean x and maximum o 
of the absolute variance ( |4.8[ in samples of parameters (left: offline sample A tr iai \ {Ai,« = 1, . . . ,/}; 
right: online sample At cs t ) with respect to the size I of the reduced basis. 
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Fig. 4.10. Algorithm 1 (left) and 2 ( right ) for FENE model with 6 = 16: Minimum +, mean 
x and maximum o of the relative variance \4.9\ in online test for samples Atest (top) and A tcs twidc 
(bottom) of parameters, with respect to the size I of the reduced basis. 
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Appendix A. Algorithm 2 in a higher-dimensional setting (<i>4). 

The solution u x {t,y) to (2.7 1 can be computed at any (t,y) € [0,T] x R d by the 
martingale representation theorem |12j : 

g x (X x )-£ f x (s,X*)ds = u x (t,X*) + J Vu x (s,X x )-a x ( S ,X x )dB s: (6.1) 



obtained by an It 6 formula similar to (2.8 1. This gives the following Feynman-Kac 
formula for u x (t,x), which can consequently be computed at any (t,y) G[0,T]xM. d 
through Monte-Carlo evaluations: 



u x (t,y)=E[g\X^)- / f x ( s ,X x ^)d s , 



(6.2) 



where (X^ to ' v ) to <t<T is the solution to (1.2 1 with initial condition X^ ta ' v = y. Differ- 
entiating ( |6.2| (provided f x and g x are differentiable) , we even directly get a Feynman- 
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\,t ,y . 



Kac formula for Vu x (t,y): 



V 5 A (X A <^) - jT • V/ A (s,X A ^)cfe) (6.3) 



where the stochastic processes ($^*'»,se[t,T]) in R dxd satisfy the first-order varia- 
tion of the SDE (1.2 1 with respect to the initial condition, that is 
for any s€ [<,T]: 



V„X A >^ 



$ A ^ = ld d + J' $ X /' V • V&V, X^ y )ds' + J° $ X /' V • Vct a (s', X^ y )dB s ,, 



(6.4) 



where Id d denotes the d x d identity matrix (see [21] for a more general and rigorous 
presentation of this Feynman-Kac formula in terms of the Malliavin gradient). The 
stochastic integral ( |2.9[ l can then be computed for each realization of (B t ), after 
discretizing ($ A '*^,sG [t,T]). 



Discrete approximations of the Feynman-Kac formula (6.3 1 have already been 



used succesfully in the context of computing control variates for the reduction of vari- 
ance, in [21] for instance. Note that this numerical strategy to compute Vu A from a 
Feynman-Kac formula requires a lot of computations. Yet, most often, the computa- 
tion time of the functions (t,y) — > Vit A (t,y) would not be a major issue in a reduced- 
basis approach, since this would be done offline (that is, in a pre-computation step, 
once for all) for only a few selected values of the parameter A. What is nevertheless 
necessary for the reduced-basis approach to work is the possibility to store the big 
amount of data corresponding to a discretization of Vu x (t,y) on a grid for the vari- 
able (t,y) £ [0,T] x R d , (the parameter A then assuming only a few values in A — of 
order 10 in our numerical experiments — ), and to have rapid access to those data 
in the online stage (where control variates are computed for any AG A using those 
precomputed data). 

Appendix B. Proof of Proposition |3.3[ 



First note that, since E (Y ) — 0, then Var(Z A ) = Var(Y^ 
So, for all As A and for every linear combination of Y ™ , 

N J 
n=l j=l 



,N : 



(with any choice a n (A)€K, A„ €A, n = l,...,N), there holds (recall that the Yj, 
j = 1 , . . . , J, are uncorrelated) : 

Var (Z x - Y N ) = Var (Y x -Y N ) 



(7.1) 



EM A )-£ a »(%i( A n)V; 

j=l V n=l / 



' _ \ k( A )-£»=i a «(%i( A ") 

^ I El^(A)| 2 Var(^-) sup 



J =1 



i<j<J 



To get ( pU5) , we now explain how to choose the N coefficients a„(A), l<n<N, 
for each AG A when A^ GA, n=l,...,N is given, and then how to choose those N 
parameter values A^ € A, n = 1 , . . . , N. 
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Assume the N parameter values A„ gA, n=l,...,N, are given, with Aq =A m j n , 
A$ = A majc and A^<A^ +1 , n = 0,...,N-l. Then, for a given Me{2,...,iV} (to be 
determined later on) and for all A € A, it is possible to choose 1 < Mo (A) < N such that 
^Mo(x) — ^ — ^MoM+M-v C n ly the M coefficients corresponding to the M contiguous 
parameter values above are taken non zero, such that VA S A : 

a m (A) ^ Q & M (A) < m < M ( A) + M - 1 , 

and are more specifically chosen as a m (A) = P a ,(ta(A)) where P^ are polynomi- 
als of degree M-l, such that, for all M (A) <m,k< M (A) +M- 1, P^(r A (A fe )) = 
5 m fe. The polynomial function P^ is the Lagrange interpolant defined on 
[ta(Xm (X)),ta(X m „(X)+m-i)}, taking value 1 at r A (A m ) and at t a (A/ £ ), fc^rri. 
We will also need a function d(X) = |t(Am (A)) ~~ t (Aa/ (A)+m-i)|- Using a Taylor- 
Lagrange formula for ffjor - 1 , we have (for some 0<ry< 1) : 

ft(A)-E a ™( A )ft(A n ) = ^ r fe°r- 1 )W (WA^ o(A) ) + (l-r 7 )x(A^ /o(A)+ , / _ 1 )) . 



Then, using ( |3.14| and the fact that Var (Z A ) = X^ =i |#j(A)| 2 Var (Y,), there exists a 



constant C>0 (independent of A and J) such that: 

Var(Z A -rA,) <Var(Z A ) (Cd(A)) 2M , VAe A. (7.2) 
Finally, to get the result, we now choose a TA-equidistributed parameter sample : 

n — 1 

t a(A„ ) = T A (A min )+ (rA(A max )-r A (A m i n )) , n=l,...,N. 

Then, d(X)= A ^Zi (TA(A max ) — TA(A m i n )) does not depend on A. Minimizing 
(Cd) d as a function of dG(0,^), we choose d(A)=^, and the choice M = 1 + 
L^j^-^ — ^Z^ a (a ) J (where [x\ denotes the integer part of a real number iel) 
finishes the proof provided N > N = 1 + [Ce (rA(A max ) — TA(A m i n ))J . □ 
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